NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Dynamic Task Shaping for High Throughput Data Analysis Applications in High Energy Physics

https://doi.org/10.1109/IPDPS53621.2022.00041

Tovar, Ben; Lyons, Ben; Mohrman, Kelci; Sly-Delgado, Barry; Lannon, Kevin; Thain, Douglas (May 2022, IPDPS International Parallel and Distributed Processing Symposium)

Distributed data analysis frameworks are widely used for processing large datasets generated by instruments in scientific fields such as astronomy, genomics, and particle physics. Such frameworks partition petabyte-size datasets into chunks and execute many parallel tasks to search for common patterns, locate unusual signals, or compute aggregate properties. When well-configured, such frameworks make it easy to churn through large quantities of data on large clusters. However, configuring frameworks presents a challenge for end users, who must select a variety of parameters such as the blocking of the input data, the number of tasks, the resources allocated to each task, and the size of nodes on which they run. If poorly configured, the result may perform many orders of magnitude worse than optimal, or the application may even fail to make progress at all. Even if a good configuration is found through painstaking observations, the performance may change drastically when the input data or analysis kernel changes. This paper considers the problem of automatically configuring a data analysis application for high energy physics (TopEFT) built upon standard frameworks for physics analysis (Coffea) and distributed tasking (Work Queue). We observe the inherent variability within the application, demonstrate the problems of poor configuration, and then develop several techniques for automatically sizing tasks to meet goals of resource consumption, and overall application completion.
more » « less
Full Text Available
Matching in $$ pp\to t\overline{t}W/Z/h+ $$ jet SMEFT studies

https://doi.org/10.1007/JHEP06(2021)151

Goldouzian, Reza; Kim, Jeong Han; Lannon, Kevin; Martin, Adam; Mohrman, Kelci; Wightman, Andrew (June 2021, Journal of High Energy Physics)
null (Ed.)
A bstract In this paper, we explore the impact of extra radiation on predictions of $$ pp\to \mathrm{t}\overline{\mathrm{t}}\mathrm{X},\mathrm{X}=\mathrm{h}/{\mathrm{W}}^{\pm }/\mathrm{Z} $$ pp → t t ¯ X , X = h / W ± / Z processes within the dimension-6 SMEFT framework. While full next-to-leading order calculations are of course preferred, they are not always practical, and so it is useful to be able to capture the impacts of extra radiation using leading-order matrix elements matched to the parton shower and merged. While a matched/merged leading-order calculation for $$ \mathrm{t}\overline{\mathrm{t}}\mathrm{X} $$ t t ¯ X is not expected to reproduce the next-to-leading order inclusive cross section precisely, we show that it does capture the relative impact of the EFT effects by considering the ratio of matched SMEFT inclusive cross sections to Standard Model values, $$ {\sigma}_{\mathrm{SM}\mathrm{EFT}}\left(\mathrm{t}\overline{\mathrm{t}}\mathrm{X}+\mathrm{j}\right)/{\sigma}_{\mathrm{SM}}\left(\mathrm{t}\overline{\mathrm{t}}\mathrm{X}+\mathrm{j}\right)\equiv \mu $$ σ SMEFT t t ¯ X + j / σ SM t t ¯ X + j ≡ μ . Furthermore, we compare leading order calculations with and without extra radiation and find several cases, such as the effect of the operator $$ \left({\varphi}^{\dagger }i{\overleftrightarrow{D}}_{\mu}\varphi \right)\left(\overline{t}{\gamma}^{\mu }t\right) $$ φ † i D ↔ μ φ t ¯ γ μ t on $$ \mathrm{t}\overline{\mathrm{t}}\mathrm{h} $$ t t ¯ h and $$ \mathrm{t}\overline{\mathrm{t}}\mathrm{W} $$ t t ¯ W , for which the relative cross section prediction increases by more than 10% — significantly larger than the uncertainty derived by varying the input scales in the calculation, including the additional scales required for matching and merging. Being leading order at heart, matching and merging can be applied to all operators and processes relevant to $$ pp\to \mathrm{t}\overline{\mathrm{t}}\mathrm{X},\mathrm{X}=\mathrm{h}/{\mathrm{W}}^{\pm }/\mathrm{Z}+\mathrm{jet} $$ pp → t t ¯ X , X = h / W ± / Z + jet , is computationally fast and not susceptible to negative weights. Therefore, it is a useful approach in $$ \mathrm{t}\overline{\mathrm{t}}\mathrm{X} $$ t t ¯ X + jet studies where complete next-to-leading order results are currently unavailable or unwieldy.
more » « less
Full Text Available
Harnessing HPC resources for CMS jobs using a Virtual Private Network

https://doi.org/10.1051/epjconf/202125102032

Tovar, Benjamin; Bockelman, Brian; Hildreth, Michael; Lannon, Kevin; Thain, Douglas (January 2021, EPJ Web of Conferences)
Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A. (Ed.)
The processing needs for the High Luminosity (HL) upgrade for the LHC require the CMS collaboration to harness the computational power available on non-CMS resources, such as High-Performance Computing centers (HPCs). These sites often limit the external network connectivity of their computational nodes. In this paper we describe a strategy in which all network connections of CMS jobs inside a facility are routed to a single point of external network connectivity using a Virtual Private Network (VPN) server by creating virtual network interfaces in the computational nodes. We show that when the computational nodes and the host running the VPN server have the namespaces capability enabled, the setup can run entirely on user space with no other root permissions required. The VPN server host may be a privileged node inside the facility configured for outside network access, or an external service that the nodes are allowed to contact. When namespaces are not enabled at the client side, then the setup falls back to using a SOCKS server instead of virtual network interfaces. We demonstrate the strategy by executing CMS Monte Carlo production requests on opportunistic non-CMS resources at the University of Notre Dame. For these jobs, cvmfs support is tested via fusermount (cvmfsexec), and the native fuse module.
more » « less
Full Text Available
An analysis of reproducibility and non-determinism in HEP software and ROOT data

https://doi.org/10.1088/1742-6596/898/10/102007

Ivie, Peter; Zheng, Charles; Lannon, Kevin; Thain, Douglas (October 2017, Journal of Physics: Conference Series)

Full Text Available
Scaling up a CMS tier-3 site with campus resources and a 100 Gb/s network connection: what could go wrong?

https://doi.org/10.1088/1742-6596/898/8/082041

Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Anampa, Kenyi Hurtado; Tovar, Benjamin; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas (October 2017, Journal of Physics: Conference Series)

Full Text Available
Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

https://doi.org/10.1088/1742-6596/898/5/052036

Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Anampa, Kenyi Hurtado; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; et al (October 2017, Journal of Physics: Conference Series)

Full Text Available
A new calibration method for charm jet identification validated with proton-proton collision events at √s = 13 TeV

https://doi.org/10.1088/1748-0221/17/03/P03014

Tumasyan, Armen; Adam, Wolfgang; Andrejkovic, Janik Walter; Bergauer, Thomas; Chatterjee, Suman; Dragicevic, Marko; Escalante Del Valle, Alberto; Fruehwirth, Rudolf; Jeitler, Manfred; Krammer, Natascha; et al (March 2022, Journal of Instrumentation)

Abstract Many measurements at the LHC require efficient identification of heavy-flavour jets, i.e. jets originating from bottom (b) or charm (c) quarks. An overview of the algorithms used to identify c jets is described and a novel method to calibrate them is presented. This new method adjusts the entire distributions of the outputs obtained when the algorithms are applied to jets of different flavours. It is based on an iterative approach exploiting three distinct control regions that are enriched with either b jets, c jets, or light-flavour and gluon jets. Results are presented in the form of correction factors evaluated using proton-proton collision data with an integrated luminosity of 41.5 fb -1 at √s = 13 TeV, collected by the CMS experiment in 2017. The closure of the method is tested by applying the measured correction factors on simulated data sets and checking the agreement between the adjusted simulation and collision data. Furthermore, a validation is performed by testing the method on pseudodata, which emulate various mismodelling conditions. The calibrated results enable the use of the full distributions of heavy-flavour identification algorithm outputs, e.g. as inputs to machine-learning models. Thus, they are expected to increase the sensitivity of future physics analyses.
more » « less
Full Text Available

Search for: All records